(54) Abstract Title

A system for video, audio and graphic presentation in tandem with video/audio play 

(57) The present invention is a method for the 
coordination and display of graphics or play out of audio 
or video in conjunction with a multimedia presentation. 
Graphics objects such as text or sprites, which may be 
animated, are displayed by the viewer's equipment, 
which is typically an advanced television or set-top box, 
which plays the video and audio. Graphics objects are 
displayed at locations which are complementary to the 
locations of objects in the video. Similarly, audio or video 
clips, stored in the set-top box or embedded in the
stream, can be played at appropriate times and screen
locations in the presentation called video and audio
"holes". Data describing these "holes", and other control
information is embedded in the video stream and 
extracted by the viewer's STB for use in coordinating 
tandem play.

A SYSTEM FOR VIDEO, AUDIO, AND GRAPHIC PRESENTATION
IN TANDEM WITH VIDEO/AUDIO PLAY

The present invention relates to displaying the content of audio, 
video, and graphic units in tandem with a multimedia presentation having 
holes indicating predefined locations and times for the display of the 
audio, video, and graphic units. 

Many video applications, including interactive and multimedia 
applications, take advantage of the video viewer's equipment capability 
to display graphics overlays on the video screen such as a TV or a PC 
monitor. These graphics displays either dominate the entire screen, as 
in the case of many electronic program guides or menus, or sections 
thereof. The video behind these graphic overlays is entirely or 
partially obscured, thereby interfering with the viewing experience. 
Systems for the presentation of electronic program guides, such as 
described in U.S. Patent Nos. 5,737,030, 5,592,551, 5,541,738, and 
5,353,121, display these guides either on a screen devoid of video or one 
which uses a still frame or moving video simply as a background, with no 
coordination between the location of items in the video and the location 
of graphics overlays. 

Currently, viewers' equipment, such as set-top boxes (STB), does
not have the capability to determine where objects are located in the 
video. Determination of an object's location in a video is necessary in 
order to place the graphics objects, such as the on-screen text or 
animated characters, in locations which do not interfere with objects 
appearing in the video presentation. 

Systems such as the one described in U.S. Patent No. 5,585,858 
attempt to coordinate video and graphic displays by including in the 
broadcast stream, or pre-storing at the viewers' equipment, graphic 
overlay screens designed to be compatible with the video content. 
However, these screens must be created well in advance of the 
presentation, and thus lack the flexibility to create and display 
non-interfering graphics overlays adaptively. In addition, those systems
display graphics at specific "trigger points" in the presentation, not at 
arbitrary points throughout the presentation. 

Other systems which add graphics or audio content to an existing 
presentation, such as described in U.S. Patent No. 5,708,764, require the 
active participation of the viewer in the process of presentation. The 
viewer, for example, may be required to answer a number of questions 
before or during the presentation; the responses are then displayed on
the screen at predetermined times. 



Systems which allow the personalization of content for individual 
users are well known in the context of Web browsing. Other systems, such 
as systems described in U.S. Patent Nos. 5,585,858 and 4,616,327, provide
a limited number of introductions, by the viewers' equipment, of
predetermined text or graphics. Some systems, such as described in U.S.
Patent Nos. 4,839,743, 4,786,967, and 4,847,700, provide audio and/or 
video personalization through the selection among a small number of 
alternate video and audio tracks which are broadcast simultaneously. The 
selection is performed at the viewer's equipment. 

Accordingly the invention provides a method for displaying content 
of audio, video, and graphic units in tandem with a multimedia 
presentation having holes indicating predefined locations and times for 
the display of said audio, video, and graphic units, the method 
comprising: communicating a multimedia presentation stream to a 
receiving device; determining if said multimedia presentation stream 
includes holes information embedded therein; extracting said holes 
information; and displaying said audio, video, and graphic units in 
tandem with said multimedia presentation in said holes of said multimedia 
presentation.

According to the preferred embodiment the location and timing of 
video objects and audio events are made available to the viewers' display 
equipment. This gives that equipment the flexibility to add 
non-interfering graphics or audio when and where it sees fit, in an 
adaptive manner throughout a presentation, rather than at limited points. 
This allows the viewers' equipment to create a tandem 
video/audio/graphics presentation without requiring viewers' active 
participation in the presentation process. A preferred embodiment of the 
present invention allows coordination of graphics content that is not 
pre-stored, such as broadcast news bulletins, and performs still or 
animated graphics overlay of video, addition or replacement of video, and 
audio replacement in coordination with the existing video and audio 
content of a presentation. 

According to the preferred embodiment a system is provided for the 
definition and use of information which enables the display or playing of 
audio, video or graphics objects in tandem with the video and audio play 
of a digital video presentation. The presentation thus enhanced may be 
available via a broadcast or in a video-on-demand scenario. The video 
distribution system over which the video is made available can be a 
one-way system, such as a terrestrial television broadcast, or a two-way 
communication, such as a hybrid fiber/coaxial cable system with return
channel capability. 



According to the preferred embodiment the tandem presentation of 
additional audio, video, or graphics is made possible by defining video 
and audio holes in the video or audio presentation at which there is no 
significant video or audio activity. Holes are locations and times in 
the video presentation. Graphics or audio objects are appropriately 
presented by the STB in those holes. The STB is notified as to the 
location and/or times associated with these holes, as well as other 
information which characterizes the material which the STB must present. 

With this information, the STB is able to judiciously place 
graphics objects on screen or play audio or video content, and avoid 
interference with video objects or audio events. The graphics objects 
displayed by the STB can be static or dynamic, i.e., animated. Thus, a 
preferred embodiment of the present invention also enables the creation 
of video presentations in which objects in the original video or 
animation interact and move in tandem with video or graphics objects 
which are added by the viewer's equipment. For example, a cartoon may be 
created in which several characters are seen on screen at once and a hole 
is left for the addition of an animated character which is added by the 
viewer's equipment such as an STB. 

Alternatively, the hole could be defined at the location of a 
relatively less important character which can be obscured by the 
STB-animated character. The viewers whose STB does not support the 
present invention will still be able to see a presentation with no video 
holes. The information as to what type of character can be added, at 
what screen locations, at what times, and optionally, the motion of the 
added character is delivered to the STB in advance of the display of the 
character.



Similarly, a preferred embodiment of the present invention allows 
tandem audio play between the audio content of the presentation and audio 
content which is introduced by the STB. 

The preferred embodiment allows for the personalization of the 
video, graphics or audio content introduced by the STB. The 
personalization is achieved by a viewer when he or she specifies several 
personal parameters, such as name and age through a viewer interface. To 
continue the above example, a child's name may be entered in the STB's 
personalization information. When viewing the prepared presentation, the 
STB-animated character can display this child's name, when this character 
is presented in the location of video holes. Alternatively, the STB can 
play an audio clip of the child's name during audio holes. Personalized 
audio or video clips may be recorded and stored in the STB for use in the 
tandem play. 



Thus, a preferred embodiment of the present invention allows a 
single version of material such as a cartoon presentation to be created 
and broadcast, yet be viewed and heard differently by various viewers, 
and tailored to them specifically. A hybrid presentation is in effect 
created, the sum of the original presentation and the graphics and/or 
audio which is introduced by the viewers' STB into the holes. 

Accordingly, personalization information, audio and video segments 
and possibly hole information are stored in the STB. The STB receives a 
multimedia presentation stream embedded with hole information. The hole 
information is embedded into the stream during an authoring stage, where 
the creator of the presentation determines the hole locations and times. 
That hole information is extracted on the STB, and audio and video 
segments and personalization information previously stored on the STB, 
are coordinated with the holes and displayed in tandem with the 
multimedia presentation. 

In a further aspect, the invention provides a computer program 
device readable by a machine, tangibly embodying a program of 
instructions executable by a machine for displaying content of audio, 
video, and graphic units in tandem with a multimedia presentation having 
holes indicating predefined locations and times for the display of said 
audio, video, and graphic units, said program comprising: means for 
communicating a multimedia presentation stream to a receiving device; 
means for determining if said multimedia presentation stream includes 
holes information embedded therein; means for extracting said holes 
information; and means for displaying said audio, video, and graphic 
units in tandem with said multimedia presentation in said holes of said 
multimedia presentation. 

In a yet still further aspect, the invention provides apparatus for 
defining the display of content of audio, video, and graphic units in 
tandem with a multimedia presentation having holes indicating predefined 
locations and times for the display of said audio, video, and graphic 
units, the apparatus comprising: means for receiving a multimedia 
presentation stream; means for determining if said multimedia 
presentation stream includes holes information embedded therein; means 
for extracting said holes information; and means for transferring said 
audio, video, and graphic units in tandem with said multimedia 
presentation in said holes of said multimedia presentation to a device 
for display thereof. 

A preferred embodiment of the present invention will now be 
described in detail, by way of example only, and with reference to the 
following drawings: 



Figure 1 is a view of a monitor screen displaying an animated 
presentation with the location of a video hole indicated. 



Figure 2 is the view of the same screen as Figure 1, with the 
addition of an STB-animated character in the video hole location. 

Figure 3 is a flowchart showing steps involved in extracting and 
processing hole information from a multimedia presentation stream
according to a preferred embodiment of the present invention. 

Figure 4 shows equipment necessary for the extraction of hole 
information and display of tandem content according to a preferred 
embodiment of the present invention. 

The steps necessary to prepare and to play a presentation with 
tandem STB video graphics display and/or audio or video play according to 
a preferred embodiment of the present invention include: 

1. defining video and audio holes during an authoring stage and 
embedding them as part of control information in the presentation stream 
with video and audio; 

2. performing personalization on viewer's STB; 

3. delivering the presentation stream to viewer's STB; 

4. extracting the control information from the presentation stream 
and parsing by the STB; and 

5. displaying video and audio of the presentation stream together 
with graphics, audio, or video objects provided by the STB during the 
time and location of the holes. 

AUTHORING STAGE 

According to the preferred embodiment, in order to specify the 
location and time of video and audio holes, a video presentation is 
marked with control information. This is done offline, through the use 
of an authoring system designed for this marking process and described in 
U.S. Patent Application No. 09/032,491. 

In an alternative preferred embodiment, the control information is 
added in real time to a live presentation in progress, by specifying 
video holes to the STB. The STB uses this information to display text 
associated with the program, e.g., news or a sports program, and 
broadcast along with the video and audio. The choice of text for display 
can be based on personalization information already stored in the STB.

In the preferred embodiment, the authoring system accepts as input
video/audio content. An author steps through the content, marking
locations of video and/or audio holes. The markings thus created are
used by the authoring system to create control information describing
these holes, which is inserted into the video/audio content.

In the preferred embodiment, the control information takes the form 
of HTML tags which indicate: 

1. hole identifier used to coordinate hole with insertion
application,

2. hole type, e.g., video or audio,

3. beginning time of hole,

4. ending time of hole,

5. beginning screen location of hole, e.g., x, y coordinates in
video,

6. ending screen location of hole, e.g., x, y coordinates in video,

7. motion vector for hole movement in video,

8. description of bitmap(s) to be inserted in video hole, and

9. volume level for inserted audio.
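
The field list above can be illustrated with a minimal sketch in Python. The text states only that the control information takes the form of HTML tags; the tag name "hole" and the attribute names used here (id, type, begin, end, from, to, motion, bitmap, volume) are hypothetical and chosen purely for illustration, as are the names HoleInfo and parse_holes.

```python
from dataclasses import dataclass
from html.parser import HTMLParser
from typing import List, Optional, Tuple


@dataclass
class HoleInfo:
    """One hole, mirroring the nine fields listed in the text."""
    hole_id: str                                 # 1. hole identifier
    hole_type: str                               # 2. "video" or "audio"
    begin_pts: int                               # 3. beginning time
    end_pts: int                                 # 4. ending time
    begin_xy: Optional[Tuple[int, int]] = None   # 5. beginning location
    end_xy: Optional[Tuple[int, int]] = None     # 6. ending location
    motion: Optional[Tuple[int, int]] = None     # 7. motion vector
    bitmap: Optional[str] = None                 # 8. bitmap description
    volume: Optional[int] = None                 # 9. volume level


def _pair(text):
    """Parse "x,y" into a tuple of ints; return None if absent."""
    if text is None:
        return None
    x, y = text.split(",")
    return int(x), int(y)


class HoleTagParser(HTMLParser):
    """Collects every (hypothetical) <hole ...> tag in the control data."""

    def __init__(self):
        super().__init__()
        self.holes: List[HoleInfo] = []

    def handle_starttag(self, tag, attrs):
        if tag != "hole":
            return
        a = dict(attrs)
        self.holes.append(HoleInfo(
            hole_id=a["id"],
            hole_type=a["type"],
            begin_pts=int(a["begin"]),
            end_pts=int(a["end"]),
            begin_xy=_pair(a.get("from")),
            end_xy=_pair(a.get("to")),
            motion=_pair(a.get("motion")),
            bitmap=a.get("bitmap"),
            volume=int(a["volume"]) if "volume" in a else None,
        ))


def parse_holes(control_data: str) -> List[HoleInfo]:
    """Return all holes described in a string of control information."""
    parser = HoleTagParser()
    parser.feed(control_data)
    parser.close()
    return parser.holes
```

Under these assumptions, a video hole would carry the location fields and leave the volume unset, while an audio hole would carry the volume and omit the screen locations.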

Automatic object recognition is incorporated into the authoring
system to simplify the authoring process. An author specifies the
initial location of a video object, e.g., a less-significant character,
and its subsequent locations are detected by the authoring system, which
inserts appropriate control information into the stream as the object
moves.

For digital video streams, the Moving Picture Experts Group
(MPEG-2) compression for audio and video signals, and MPEG-2 Systems
transport for the transport of those signals may be used. A compression
method is usually applied to a video before transmission over a network
because of the high bit rate requirements of digital video. In the
preferred embodiment, video and audio content are compressed using MPEG-2
compression, as specified in ISO/IEC 13818-2 for video and ISO/IEC
13818-3 for audio.

The MPEG-2 standard also specifies how presentations consisting of
audio and video elementary streams can be multiplexed together in a
"transport stream". This is specified in the MPEG-2 Systems
Specification, ISO/IEC 13818-1. The MPEG-2 Systems Specification
accommodates the inclusion in a presentation's transport stream of
non-video and non-audio streams, by use of "private data" streams. All
transport stream packets, regardless of content, are of a uniform size
(188 bytes) and format. "Program-Specific Information", which is also
carried in the transport stream, carries the information regarding which
elementary streams have been multiplexed in the transport stream, what
type of content they carry, and how they may be demultiplexed. In the
preferred embodiment, the control information is carried in an MPEG-2
Transport Stream private data stream.
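
By way of illustration only, the uniform packet format described above may be sketched as follows. The sketch reads the fixed four-byte header of a 188-byte transport packet (sync byte 0x47 followed by a 13-bit packet identifier, or PID) and selects the packets of one PID. The function names are hypothetical, and the PID on which the control information travels is an assumption; in practice it would be learned from the Program-Specific Information rather than fixed in advance.

```python
TS_PACKET_SIZE = 188   # uniform transport packet size, as stated in the text
SYNC_BYTE = 0x47       # first byte of every MPEG-2 transport packet


def parse_ts_header(packet: bytes) -> dict:
    """Decode the fixed 4-byte header of one transport stream packet."""
    if len(packet) != TS_PACKET_SIZE:
        raise ValueError("transport packets are exactly 188 bytes")
    if packet[0] != SYNC_BYTE:
        raise ValueError("missing sync byte 0x47")
    return {
        "payload_unit_start": bool(packet[1] & 0x40),
        # The PID is 13 bits: low 5 bits of byte 1, then all of byte 2.
        "pid": ((packet[1] & 0x1F) << 8) | packet[2],
        "has_adaptation_field": bool(packet[3] & 0x20),
        "has_payload": bool(packet[3] & 0x10),
        "continuity_counter": packet[3] & 0x0F,
    }


def packets_for_pid(stream: bytes, wanted_pid: int):
    """Yield the packets of one PID, e.g. a private data stream."""
    last_start = len(stream) - TS_PACKET_SIZE
    for offset in range(0, last_start + 1, TS_PACKET_SIZE):
        packet = stream[offset:offset + TS_PACKET_SIZE]
        if parse_ts_header(packet)["pid"] == wanted_pid:
            yield packet
```

A demultiplexer in an STB performs this filtering in hardware; the sketch merely shows why a private data stream can be separated from video and audio without inspecting the payload.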

In the preferred embodiment, beginning and ending times for hole 
specification are specified in terms of the Presentation Time Stamp (PTS) 
of the frames where the hole appears. PTSs are typically present in 
every frame to every third frame, and this is sufficient for 
synchronization, since the frame rate for NTSC video is 30 frames/second.
Video holes are rectangular, and thus specified by a pair of (x, y) 
coordinates. Other embodiments may use more complex polygons to describe 
video hole shape, and require more coordinates and a specification of 
which polygon is to be used. The video hole movement is linear between 
the beginning and ending screen location. Again, more complex functions 
may be specified in other embodiments to describe video hole movement. 
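
The linear movement of a rectangular hole between its beginning and ending screen locations can be sketched as below. The sketch assumes that each screen location is a rectangle given by two corner coordinates, and that times are PTS values on the 90 kHz clock of the MPEG-2 Systems Specification, so that one frame at 30 frames/second spans 3000 ticks. The function name hole_rect_at is hypothetical, and wrap-around of the PTS counter is ignored for simplicity.

```python
from typing import Optional, Tuple

# (x1, y1, x2, y2): the pair of (x, y) corners describing a rectangle.
Rect = Tuple[int, int, int, int]


def hole_rect_at(pts: int, begin_pts: int, end_pts: int,
                 begin_rect: Rect, end_rect: Rect) -> Optional[Rect]:
    """Return the hole rectangle at time pts, or None outside the hole."""
    if pts < begin_pts or pts > end_pts:
        return None
    if end_pts == begin_pts:
        return begin_rect
    # Fraction of the hole's lifetime that has elapsed, from 0.0 to 1.0.
    t = (pts - begin_pts) / (end_pts - begin_pts)
    # Each corner coordinate moves linearly from its start to its end value.
    return tuple(round(b + t * (e - b))
                 for b, e in zip(begin_rect, end_rect))
```

For example, a hole lasting one second (90000 ticks) that moves 200 pixels to the right is found 100 pixels to the right of its starting position at the half-second mark.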

DELIVERY STAGE 

According to the preferred embodiment, the control information is 
expressly created to describe holes left in the video and/or audio for 
insertion of the content by the STB. In order to show a full 
presentation to those viewers whose STB does not support a preferred 
embodiment of the present invention, holes may actually be a default unit 
of video or audio content. Presentations which were not designed for a 
preferred embodiment of the present invention may be retrofitted to 
accommodate it, i.e., holes may be found in the existing content areas 
and/or sounds which can be overlaid. 

According to the preferred embodiment, after forming the control 
information, the video presentation together with such control 
information may be transported to the viewer's STB by being sent: 

a. in the video blanking interval of an analog video signal and
extracted by the viewers' equipment in a manner similar to that used for
closed-caption information;

b. in a separate Vestigial Side Band channel; or

c. within a digital video/audio stream, where extraction of embedded
data is performed by the viewers' equipment in a manner similar to that
used for the extraction of video or audio streams.

THE STB 

Figure 4 shows typical equipment necessary for performing a 
preferred embodiment of the present invention. It comprises a television 
set or a monitor screen 4, cable 6 to receive the multimedia 
presentation, and the STB 5 to accept, process and to forward the 
resulting presentation over cable 7, to be displayed on the monitor 
screen 4. MPEG-2 demultiplexers, MPEG-2 audio decoders and MPEG-2 video
decoders are now widely available. The C-Cube CL9110 Transport
Demultiplexer, C-Cube CL9100 MPEG-2 Video Decoder, and Crystal
Semiconductor CS4920 MPEG Audio Decoder are examples. In the preferred 
embodiment, the video and audio decoders are implemented together in a 
single chip, such as the IBM CD21 MPEG-2 Audio/Video decoder. If not 
incorporated in the audio and video decoder, an intermediate IC is used 
at the output of the decoders to convert from digital to analog and, in 
the case of video, encode to the desired video analog signal format such 
as NTSC, PAL, or SECAM. S-video output from these IC's is optional. 

In the preferred embodiment, the on-screen graphics objects which 
overlay video content are rendered using the on-screen display (OSD) 
functions of the MPEG-2 Video Decoder in the STB. These decoders vary in 
the sophistication of the OSD which they offer and in the application 
program interfaces (API) which are used to control the OSD. Individual 
pixels can be addressed, and bitmaps are used for many text and graphic 
objects. A minimum level of OSD graphics capability offers 16 colours. 
A preferred capability offers 256 colours and multi-level blending 
capability. The blending capability of the OSD allows for varying 
degrees of opacity for the graphics overlay. 
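
The effect of multi-level blending on a single pixel may be sketched as follows. The number of blending levels and the linear mixing rule are assumptions made for illustration; actual decoders differ in the levels they offer and in the API that controls them. The function name blend_pixel is hypothetical.

```python
def blend_pixel(overlay, video, opacity_level, levels=16):
    """Blend one RGB pixel of graphics overlay onto one pixel of video.

    opacity_level runs from 0 (overlay fully transparent) to
    levels - 1 (overlay fully opaque).
    """
    if not 0 <= opacity_level < levels:
        raise ValueError("opacity_level out of range")
    alpha = opacity_level / (levels - 1)
    # Weighted average of the two pixels, channel by channel.
    return tuple(round(alpha * o + (1 - alpha) * v)
                 for o, v in zip(overlay, video))
```

At the highest level the graphics object completely obscures the video beneath it; at intermediate levels the video remains partially visible through the overlay.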

Overlay of audio content is performed by the STB audio decoder in 
the case of MPEG audio or by the STB processor utilizing an API to a 
media player. File formats supported by this player include ".wav",
".rmi", and ".mid". Alternatively, the audio playing function can be
incorporated into the STB's application itself. 

Video replacement or addition can be performed by an additional 
video decoder in the STB. Systems with "picture-in-picture" capability
can use this feature for addition or replacement of video objects. 

In either case, the audio being played is mixed with or pre-empts 
the original audio of the presentation, utilizing the STB's audio output. 
In another embodiment, one in which two tracks of audio are available, 
one for music and one for dialogue, the STB replaces the content of the 
latter track while allowing the former to continue as usual. 
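
The two alternatives described above, mixing with or pre-empting the original audio, can be sketched on decoded samples as follows. The sketch assumes signed 16-bit PCM samples held in equal-length lists and a simple gain factor standing in for the volume level carried in the control information; the function name combine_audio is hypothetical.

```python
def combine_audio(original, inserted, volume=1.0, preempt=False):
    """Combine inserted audio with the presentation's original audio.

    Both inputs are equal-length lists of signed 16-bit PCM samples.
    With preempt=False the inserted audio is mixed with the original;
    with preempt=True it replaces the original for these samples.
    """
    if len(original) != len(inserted):
        raise ValueError("sample lists must have equal length")
    out = []
    for o, i in zip(original, inserted):
        scaled = int(i * volume)
        sample = scaled if preempt else o + scaled
        # Clip to the 16-bit range so that loud passages do not wrap.
        out.append(max(-32768, min(32767, sample)))
    return out
```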

According to the preferred embodiment, the presentation which is
to be viewed is broadcast using the NTSC or PAL standards for analog
television, or the ATSC or DVB standards for digital television. In
another embodiment, the presentation is viewed and controlled on a
per-user basis, as with a video-on-demand system or viewing from a
video tape.

The processing power needed to implement a preferred embodiment of
the present invention can be easily accommodated by the processing
capabilities of the processors in most current STB's, which start at
roughly 1 MIPS. This processor runs the video/audio content insertion
application, and controls the use of the OSD and audio functions.

An STB 5 typically has between 1 and 4 MB of RAM. The program of 
a preferred embodiment of the present invention is downloaded to or 
stored in the RAM of the STB, and occupies up to approximately 0.5 MB.

Only a small amount of the STB 5 storage is required to store 
personalization information for all viewers in a household. In the 
preferred embodiment, personalization information for each viewer 
includes:

1. name,

2. age, 

3. content restrictions, e.g., PG-13, 

4. text preference, e.g., large type, 

5. enable audio replacement, 

6. enable video replacement, and 

7. pointer to sprite associated with viewer. 

This information is stored in non-volatile memory in order to 
persist when the viewers' STB is powered off or during power failures. 
Typical STB's have non-volatile RAM for this purpose. 
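
A minimal sketch of the per-viewer record listed above, and of its conversion to and from a byte string suitable for writing to non-volatile memory, is given below. The field names, the use of JSON as the stored format, and the sprite identifier standing in for the "pointer to sprite" are all assumptions made for illustration.

```python
import json
from dataclasses import asdict, dataclass
from typing import List


@dataclass
class ViewerProfile:
    """Personalization information for one viewer in the household."""
    name: str
    age: int
    content_restriction: str   # e.g. "PG-13"
    text_preference: str       # e.g. "large"
    audio_replacement: bool    # enable audio replacement
    video_replacement: bool    # enable video replacement
    sprite_id: str             # stands in for the pointer to a sprite


def encode_profiles(profiles: List[ViewerProfile]) -> bytes:
    """Serialize all household profiles for non-volatile storage."""
    return json.dumps([asdict(p) for p in profiles]).encode("utf-8")


def decode_profiles(data: bytes) -> List[ViewerProfile]:
    """Restore the profiles after power is re-applied."""
    return [ViewerProfile(**d) for d in json.loads(data.decode("utf-8"))]
```

A record of this kind occupies well under a kilobyte per viewer, which is consistent with the statement that only a small amount of STB storage is required.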

Figures 1 and 2 provide example screen displays according to a 
presentation prepared initially for a tandem play. Figure 1 shows a 
screen 10 of an animated program with one video character 20. The 
location of a hole 30 is indicated by dotted lines 40. The dotted lines 
40 around the hole 30 are only illustrative, and would not appear in the 
actual program. Control information concerning the location of the hole 
30 is embedded in the video stream and extracted by the STB. 

Figure 2 shows the same screen with the addition of an STB-animated 
character 50 which is displayed in the location of a hole 30. 
Alternatively, the STB could have used the hole 30 for display of 
graphics text describing the character, for example. 

It is also possible to prepare for a presentation utilizing a 
mechanism that looks for locations of holes 30 which occur naturally in
the audio and video presentation. Alternatively, holes 30 may be created 
in a presentation by blanking out sections of the existing audio track or 
obscuring sections of the video screen. 

The logical flow of the application which is loaded into the STB 
and used to parse control data of the video presentation stream and to 
display information stored in the STB in the holes 30 of the
presentation, is shown in Figure 3. According to the preferred
embodiment, the Program-Specific Information (PSI) of the current
presentation is parsed at step 80. A determination is made at step 81
whether any control information with hole locations will be arriving
with this presentation. If the information will not be arriving, the
program control returns to step 80, and the next presentation will be
parsed. If the information will be arriving, then at step 82
demultiplexer queues are set up to receive it. At step 83, a
determination is made whether the control data has arrived in the
demultiplexer queues; if not, the test at step 83 is repeated. When the
information has arrived at the queues, it is parsed at step 84 to
ascertain the HTML tags. At step 85 the HTML tags are matched with the
hole information. If there is no match, the program control returns to
step 83. If there is a match, step 86 assigns the received data to
associated variables, and returns program control to step 83.
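
The flow of steps 80 to 86 may be sketched as follows. The sketch is a simplification under stated assumptions: the PSI is represented by a dictionary with a hypothetical "has_control_info" entry, arriving control data is supplied as a finite sequence rather than polled indefinitely, the tag parser is passed in as a function, and a "match" at step 85 is taken to mean that the hole identifier is one known to the insertion application. All names are hypothetical.

```python
from collections import deque


def run_hole_extraction(psi, arrivals, known_hole_ids, parse_tags):
    """Return {hole_id: fields} for the matching holes of one presentation.

    psi            -- dict standing in for the parsed PSI (step 80)
    arrivals       -- iterable of control data chunks, in arrival order
    known_hole_ids -- identifiers the insertion application can fill
    parse_tags     -- function turning one chunk into a list of dicts
    """
    # Step 81: will control information arrive with this presentation?
    if not psi.get("has_control_info", False):
        return {}                      # back to step 80, next presentation

    queue = deque()                    # step 82: set up demultiplexer queue
    holes = {}
    for chunk in arrivals:             # stands in for data arriving over time
        queue.append(chunk)
        while queue:                   # step 83: has control data arrived?
            data = queue.popleft()
            for tag in parse_tags(data):        # step 84: parse the tags
                hole_id = tag.get("id")
                if hole_id in known_hole_ids:   # step 85: match
                    holes[hole_id] = tag        # step 86: assign variables
    return holes
```

Unlike the flowchart, which returns to step 83 indefinitely, the sketch ends when the supplied data is exhausted so that it can be run in isolation.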

When all the information about holes and the overlay information is
parsed and assembled in the STB, it becomes a straightforward,
commonly known task for the STB to overlay content at the given hole
coordinates with overlay data while displaying the presentation stream on
a video monitor. A similar process applies to audio holes.

A preferred embodiment of the present invention relating to the
display of graphics objects such as text or sprites overlaying a
multimedia television presentation, and more specifically to the display
of animated graphics or play out of video or audio coordinated with a
multimedia presentation, has been described herein.



CLAIMS 



1. A method for displaying content of audio, video, and graphic units 
in tandem with a multimedia presentation having holes indicating 
predefined locations and times for the display of said audio, video, and 
graphic units, the method comprising: 

communicating a multimedia presentation stream to a receiving 
device; 

determining if said multimedia presentation stream includes holes 
information embedded therein; 

extracting said holes information; and 

displaying said audio, video, and graphic units in tandem with said 
multimedia presentation in said holes of said multimedia presentation. 

2. The method of claim 1, wherein said audio, video, and graphic units 
are stored in said receiving device.

3. The method of claim 1, wherein said audio, video, and graphic units 
are communicated with said multimedia presentation. 

4. The method of claim 1, 2 or 3, wherein said holes information is 
determined and embedded in a multimedia presentation stream in an 
authoring step prior to the communication step. 

5. The method of any preceding claim wherein said holes information is 
allowed to be altered in said receiving device via a user interface. 

6. The method of any preceding claim, wherein said holes information 
includes: an identifier for coordination with insertion application, 
media type, beginning time, ending time, beginning screen location, 
ending screen location, motion vector for movement in video, description 
of a bitmap if said video is to be inserted, and volume level if audio is 
to be inserted. 

7. The method of any preceding claim, wherein said holes information 
is defined in such a way that displaying of said audio, video and 
graphics units will not interfere with viewing of said multimedia 
presentation. 

8. The method of any preceding claim, wherein said holes information 
is defined in coordination with visible objects in said multimedia 
presentation. 

9. The method of any preceding claim, wherein said holes information 
is defined in such a way that audio play can be performed without 
interfering with the sound of said multimedia presentation.

10. The method of any preceding claim, wherein said holes information
is defined in such a way that audio units can be introduced in 
coordination with the audio units of said multimedia presentation. 

11. The method of any preceding claim, wherein said holes information 
is used in displaying said audio, video, and graphic units in such a way 
as not to interfere with the viewing or hearing of said multimedia 
presentation.

12. The method of any preceding claim, wherein said holes information 
is used in displaying said audio, video, and graphic units which are 
coordinated in content with an audio, a video and a graphic units of said 
multimedia presentation, forming a hybrid of coordinated presentation 
from the conjunction of said multimedia presentation content and content 
of said audio, video, and graphic units. 

13. The method of any of claims 5 to 12, wherein personalization 
information is stored in said receiving device via said user interface. 

14. The method of claim 13, wherein said personalization information 
includes: said viewer's name, said viewer's age, content restriction for 
said viewer, text preference, audio replacement enablement switch, video 
replacement enablement switch, and a pointer to a sprite associated with 
a viewer.

15. A computer program device readable by a machine, tangibly embodying 
a program of instructions executable by a machine for displaying content 
of audio, video, and graphic units in tandem with a multimedia 
presentation having holes indicating predefined locations and times for 
the display of said audio, video, and graphic units, said program 
comprising: 

means for communicating a multimedia presentation stream to a 
receiving device; 

means for determining if said multimedia presentation stream 
includes holes information embedded therein; 

means for extracting said holes information; and 

means for displaying said audio, video, and graphic units in tandem
with said multimedia presentation in said holes of said multimedia
presentation.

16. The computer program device of claim 15, wherein said audio, video, 
and graphic units are stored in said receiving device. 

17. The computer program device of claim 15, wherein said audio, video, 
and graphic units are communicated with said multimedia presentation.

18. The computer program device of claim 15, 16 or 17, the program
further comprising: 

authoring means for determining hole information and for embedding 
said hole information in a multimedia presentation stream prior to the 
communication of said multimedia stream to a receiving device. 

19. The computer program device of any of claims 15 to 18 wherein said 
holes information is allowed to be altered in said receiving device via a 
user interface. 

20. The computer program device of any of claims 15 to 19, wherein said
holes information includes: an identifier for coordination with insertion
application, media type, beginning time, ending time, beginning screen 
location, ending screen location, motion vector for movement in video, 
description of a bitmap if said video is to be inserted, and volume level 
if audio is to be inserted. 

21. The computer program device of any of claims 15 to 20, wherein said 
holes information is defined in such a way that displaying of said audio, 
video and graphic units will not interfere with viewing of said 
multimedia presentation. 

22. The computer program device of any of claims 15 to 21, wherein said
holes information is defined in coordination with visible objects in said
multimedia presentation. 

23. The computer program device of any of claims 15 to 22, wherein said 
holes information is defined in such a way that audio play can be 
performed without interfering with the sound of said multimedia 
presentation. 

24. The computer program device of any of claims 15 to 23, wherein said 
holes information is defined in such a way that audio units can be 
introduced in coordination with the audio units of said multimedia 
presentation. 

25. The computer program device of any of claims 15 to 24, wherein said 
holes information is used in displaying said audio, video, and graphic 
units in such a way as not to interfere with the viewing or hearing of 
said multimedia presentation. 

26. The computer program device of any of claims 15 to 25, wherein said 
holes information is used in displaying said audio, video, and graphic 
units which are coordinated in content with an audio, a video and a
graphic units of said multimedia presentation, forming a hybrid of 
coordinated presentation from the conjunction of said multimedia
presentation content and content of said audio, video, and graphic units. 

27. The computer program device of any of claims 19 to 26, wherein 
personalization information is stored in a receiving device via said user 
interface. 

28. The computer program device of claim 27, wherein said
personalization information includes: said viewer's name, said viewer's 
age, content restriction for said viewer, text preference, audio 
replacement enablement switch, video replacement enablement switch, and a 
pointer to a sprite associated with a viewer. 

29. Apparatus for defining the display of content of audio, video, and 
graphic units in tandem with a multimedia presentation having holes 
indicating predefined locations and times for the display of said audio, 
video, and graphic units, the apparatus comprising: 

means for receiving a multimedia presentation stream; 

means for determining if said multimedia presentation stream
includes holes information embedded therein;

means for extracting said holes information; and

means for transferring said audio, video, and graphic units in
tandem with said multimedia presentation in said holes of said multimedia
presentation to a device for display thereof.